3D Computer Vision (2024/25)¶
Exercise¶
Upload: 30.10.2024 (11:30)
Deadline: 16.01.2025 (23:59)
Our Group¶
Submitted by Group 14:
- Saptarshi Bhattacharya
- Rohan Gadgil
- Anushree Bajaj
- Pushkar Santosh Marathe
By submitting this exercise, you confirm the following:
- All people listed above contributed to this solution
- No other people were involved in this solution
- No contents of this solution were copied from others (this includes people, large language models, websites, etc.)
Submission¶
Please hand in a single .zip file named according to the pattern "groupXX" (e.g. group00). The contents of the .zip should be as follows:
- folder with the same name as the .zip file
- .ipynb file
- .html export of .ipynb with all the outputs you got
- data folder containing necessary files to run the code
I.e.
- unzip the provided project.zip file
- rename folder "project" according to the pattern "groupXX"
- solve task inside .ipynb file
- export notebook as .html (File > Download as > HTML)
- zip folder groupXX
- submit groupXX.zip
Final Presentation¶
You will be required to present your solution in a 20-minute presentation, which includes:
- Problem Overview
- Solution Overview (e.g. pseudo code, mathematical formulas, visualizations)
- Describe challenges & optimizations
After the presentation, there will be 10 minutes of questions and answers about your work.
3D Scene Reconstruction¶
Task Overview¶
Your task in this exercise is to do a dense reconstruction of a scene. This will involve multiple steps that you will encounter and learn about as the semester progresses. You can start implementing individual steps as soon as you learn about them or wait until you have learned more to implement everything together. In the latter case, be mindful that this exercise is designed for an entire semester and the workload is accordingly large.
You will be given the following data:
- 9 color images of the scene.
- 8 Bit RGB per pixel.
- Each image rendered from a different position.
- The camera used had lens distortion.
- 9 Depth images of the scene.
- 8 Bit Grayscale per pixel: each value is the Z-depth divided by that image's maximum Z-depth, multiplied by 255.
- Each image has the same pose as the corresponding RGB image.
- The camera used was free of any distortions.
- 1 Dictionary containing camera calibration parameters.
- They belong to the camera that was used to render the RGB images.
- Distortion coefficients are given in the standard [k1, k2, p1, p2, k3] order.
- 1 Numpy array containing 8 camera transformations.
- They specify the movements that the camera went through to render all images.
- I.e. idx 0 specifies the transformation from 00.png to 01.png, idx 1 specifies the transformation from 01.png to 02.png, ...
- This applies to both RGB and Depth images, as they have the same poses.
- 1 Numpy array containing 7 features.
- The features are specified for each of the 9 images.
- Each feature is a 2D pixel location in "H, W" order, meaning the first value is the height/row in the image and the second width/column.
- If a feature was not visible, it was entered as [-1, -1].
- The features are unsorted, meaning that feature idx 0 for 00.png could be corresponding to e.g. feature idx 4 for 01.png.
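Because the movements are given pairwise between consecutive views, expressing every view in the frame of 00.png requires composing them. The sketch below uses a hypothetical helper, `chain_relative_transforms`. It assumes each matrix maps homogeneous point coordinates from view i into view i+1. The description above does not fix this convention, so if the matrices instead describe the camera's own motion, invert each one first.

```python
import numpy as np

def chain_relative_transforms(relative):
    """Compose consecutive relative 4x4 transforms into transforms from view 0.

    Assumption: relative[i] maps homogeneous coordinates expressed in view i
    into view i+1, so the map from view 0 to view k is
    relative[k-1] @ ... @ relative[0].
    """
    relative = np.asarray(relative, dtype=np.float64)
    n_views = relative.shape[0] + 1
    poses = np.empty((n_views, 4, 4))
    poses[0] = np.eye(4)  # view 0 is the reference frame
    for i in range(1, n_views):
        # left-multiply the accumulated transform by the newest movement
        poses[i] = relative[i - 1] @ poses[i - 1]
    return poses
```

Applied to the loaded `camera_movement` array, this would yield nine 4×4 matrices, with the identity for view 0. The order of multiplication matters because rotations and translations do not commute.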
Solution requirements¶
- Your code needs to compile, run, and produce an output.
- Your target output should be a dense point cloud reconstruction (without holes) of the scene.
- The output should be in the .ply format. We provide a function that exports a .ply file.
- You may inspect your .ply outputs in e.g. Meshlab.
- See the 'Dense Point Cloud' sample image to get an idea of what is possible. (Meshlab screenshot with point shading set to None)
- Your code should be a general solution.
- This means that it could run correctly for a different dataset (with same input structure).
- It should NOT include anything hardcoded specific to this dataset.
- Your code should not be unnecessarily inefficient.
- Our sample solution runs in less than 2 minutes total (including point cloud export).
- If your solution runs for more than 10 minutes, you are being wasteful in some part of your program.
Imports¶
Please note the following:
- These are all imports necessary to achieve the sample results.
- You may remove and/or add other libraries at your own convenience.
- Using library functions (from the given or other libraries) that bypass necessary computer vision tasks will not be recognized as 'solved'.
- E.g.: If you need to undistort an image to get to the next step of the solution and use the library function cv2.undistort(), then we will evaluate the undistortion step as 'failed'.
- E.g.: If you want to draw points in an image (to check your method or visualize in-between steps) and use the library function cv2.circle(), then there is no problem.
- E.g.: If you need to perform complex mathematical operations and use some numpy function, then there is no problem.
- E.g.: You do not like a provided utility function and find/know a library function that gives the same outputs from the same inputs, then there is no problem.
import os
import torch
import cv2 as cv
import numpy as np
import matplotlib.pyplot as plt
#render plots inline in the notebook (static images):
%matplotlib inline
Prepare Data¶
This should load all available data and also create some output directories. Feel free to rename variables or add additional directories as you see fit.
#Inputs
base_path = os.getcwd()
data_path = os.path.join(base_path, f"data")
img_path = os.path.join(data_path, 'images')
depth_path = os.path.join(data_path, 'depths')
print(f"The project's root path is '{base_path}'.")
print(f"Reading data from '{data_path}'.")
print(f"Image folder: '{img_path}'.")
print(f"Depth folder: '{depth_path}'.")
#Outputs
out_path = os.path.join(base_path, 'output')
ply_path = os.path.join(out_path, 'point_cloud')
os.makedirs(out_path, exist_ok=True)
os.makedirs(ply_path, exist_ok=True)
print(f"\nCreating directory '{out_path}'.")
print(f"Creating directory '{ply_path}'.")
#Load Data
camera_calibration = np.load(os.path.join(data_path, 'camera_calibration.npy'), allow_pickle=True)
camera_calibration = camera_calibration.item()#get dictionary from numpy array struct
given_features = np.load(os.path.join(data_path, 'given_features.npy'), allow_pickle=True)
camera_movement = np.load(os.path.join(data_path, 'camera_movement.npy'), allow_pickle=True)
The project's root path is '/home/hyprshree/Documents/Wi-Se 2024/3d-CV/upstream/3dCV-exercise'.
Reading data from '/home/hyprshree/Documents/Wi-Se 2024/3d-CV/upstream/3dCV-exercise/data'.
Image folder: '/home/hyprshree/Documents/Wi-Se 2024/3d-CV/upstream/3dCV-exercise/data/images'.
Depth folder: '/home/hyprshree/Documents/Wi-Se 2024/3d-CV/upstream/3dCV-exercise/data/depths'.

Creating directory '/home/hyprshree/Documents/Wi-Se 2024/3d-CV/upstream/3dCV-exercise/output'.
Creating directory '/home/hyprshree/Documents/Wi-Se 2024/3d-CV/upstream/3dCV-exercise/output/point_cloud'.
Provided Utility Functions¶
These functions are provided to reduce the complexity of some steps you might encounter. They were involved in the creation of the given samples. However, you do not have to use them and can use other means of achieving the same results.
def sample_image(numpy_image, numpy_sample_grid):
    '''
    This function samples a target image from a source image (numpy_image) based on specified pixel coordinates (numpy_sample_grid).
    Inputs:
        numpy_image: of shape=[H, W, C]. H is the height, W is the width, and C is the color channel of the source image from which color values will be sampled.
        numpy_sample_grid: of shape=[H, W, UV]. H is the height and W is the width of the target image that will be sampled. UV are the pixel locations (U = column/x, V = row/y) in the source image from which to sample color values.
    Outputs:
        sampled_image: of shape=[H, W, C]. H is the height, W is the width, and C is the color channel of the target image that was sampled.
    '''
    height, width, _ = numpy_image.shape#[H, W, 3]
    #turn numpy array to a float64 torch tensor; torch.tensor copies, so the caller's grid is not modified by the in-place normalization below
    torch_sample_grid = torch.tensor(numpy_sample_grid, dtype=torch.float64)#[H, W, 2]
    #normalize from range (0, width-1) to (0, 1)
    torch_sample_grid[:, :, 0] = torch_sample_grid[:, :, 0] / (width-1)
    #normalize from range (0, height-1) to (0, 1)
    torch_sample_grid[:, :, 1] = torch_sample_grid[:, :, 1] / (height-1)
    #normalize from range (0, 1) to (-1, 1)
    torch_sample_grid = torch_sample_grid*2 -1
    #transform to necessary shapes
    torch_sample_grid = torch_sample_grid.unsqueeze(0)#[1, H, W, 2]
    torch_image = torch.from_numpy(numpy_image).double().permute(2, 0, 1).unsqueeze(0)#[1, 3, H, W]
    #sample image according to sample grid locations from source image
    sampled_image = torch.nn.functional.grid_sample(torch_image, torch_sample_grid, mode='bilinear', padding_mode='zeros', align_corners=True)
    #transform back to numpy image
    sampled_image = sampled_image.squeeze().permute(1, 2, 0).numpy().astype(np.uint8)#[H, W, 3]
    return sampled_image
def ply_creator(input_3d, rgb_data=None, filename='dummy'):
    ''' Creates a colored point cloud that you can visualise using e.g. Meshlab.
    Inputs:
        input_3d: of shape=[N, 3], each row is the 3D coordinate of each point
        rgb_data(optional): of shape=[N, 3], each row is the rgb color value of each point
        filename: file name for the .ply file to be created
    '''
    assert (input_3d.ndim==2),"Pass 3d points as NumPointsX3 array "
    pre_text1 = """ply\nformat ascii 1.0"""
    pre_text2 = "element vertex "
    pre_text3 = """property float x\nproperty float y\nproperty float z\n"""
    if rgb_data is not None:
        pre_text3 += """property uchar red\nproperty uchar green\nproperty uchar blue\n"""
    pre_text3 += """end_header"""
    pre_text22 = pre_text2 + str(input_3d.shape[0])
    filename = filename + '.ply'
    #the with-block closes the file even if writing fails
    with open(filename, 'w') as fid:
        fid.write(pre_text1 + '\n')
        fid.write(pre_text22 + '\n')
        fid.write(pre_text3 + '\n')
        for i in range(input_3d.shape[0]):
            for c in range(3):
                fid.write(str(input_3d[i,c]) + ' ')
            if rgb_data is not None:
                for c in range(3):
                    fid.write(str(rgb_data[i,c]) + ' ')
            #one vertex per line, without a trailing blank line at the end
            fid.write('\n')
    return filename
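`ply_creator` issues a separate `fid.write` call for every coordinate inside a Python loop. This can become a noticeable share of the runtime budget for a dense cloud with hundreds of thousands of points. The sketch below shows a vectorised alternative under a hypothetical name, `write_ply_fast`, which writes the same ASCII layout with `np.savetxt`. One assumption to note is that it rounds coordinates to six decimals.

```python
import numpy as np

def write_ply_fast(points, colors=None, filename="dummy"):
    """Write an ASCII .ply file with the same layout as ply_creator, vectorised."""
    points = np.asarray(points, dtype=np.float64)
    assert points.ndim == 2 and points.shape[1] == 3, "Pass 3D points as an N x 3 array"
    header = ["ply", "format ascii 1.0", f"element vertex {points.shape[0]}",
              "property float x", "property float y", "property float z"]
    if colors is not None:
        colors = np.asarray(colors)
        header += ["property uchar red", "property uchar green", "property uchar blue"]
    header.append("end_header")
    filename = filename + ".ply"
    with open(filename, "w") as fid:
        fid.write("\n".join(header) + "\n")
        if colors is None:
            np.savetxt(fid, points, fmt="%.6f")
        else:
            # three float coordinates followed by three integer colour channels
            data = np.hstack([points, colors.astype(np.float64)])
            np.savetxt(fid, data, fmt=["%.6f"] * 3 + ["%d"] * 3)
    return filename
```

Colours read with `cv.imread` are in BGR order, so they would need converting to RGB (e.g. `colors[:, ::-1]`) before being written as red, green, blue.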
Step 0: Perceiving the Inputs¶
num_images = given_features.shape[0]
num_features = given_features.shape[1]
print(f"Given Data contains: \n{num_images} Images\n{num_features} Features")
Given Data contains: 
9 Images
7 Features
print("\nCamera Calibration:")
for param, value in camera_calibration.items():
    print(f"\t{param}: {value}")
Camera Calibration:
    distortion_param: [-0.1, 0.02, 0.0, 0.0, -0.01]
    image_height: 551
    image_width: 881
    principal_point: [275.0, 440.0]
    focal_length_mm: 25
    sensor_width_mm: 35
    pixel_ratio: 1.0
    pixel_per_mm: 25.17142857142857
    focal_length_px: 629.2857142857142
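The printed values are internally consistent with a simple pinhole model. The pixel density matches image width divided by sensor width, the focal length in pixels is the focal length in millimetres times that density, and the principal point sits at the image centre in (row, column) order. The check below only reproduces this arithmetic from the printout. That the calibration was generated this way is an assumption.

```python
# Values copied from the calibration printout above
image_width, image_height = 881, 551
sensor_width_mm = 35
focal_length_mm = 25

px_per_mm = image_width / sensor_width_mm   # 881 / 35 ≈ 25.1714 px/mm
f_px = focal_length_mm * px_per_mm          # 25 * 25.1714 ≈ 629.2857 px

# Centre of a pixel grid indexed from 0 to size-1
cx_centre = (image_width - 1) / 2           # 440.0 (column, x)
cy_centre = (image_height - 1) / 2          # 275.0 (row, y)
```

With `pixel_ratio: 1.0`, the same focal length applies to both axes (f_x = f_y).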
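The parameters are extracted and stored for later use. The camera's intrinsic parameters are used to form the intrinsic camera matrix K. Note that principal_point is stored in (row, column) order, so its second entry is c_x and its first entry is c_y.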
dist_coeffs = camera_calibration['distortion_param']
principal_point = camera_calibration['principal_point']
focal_length_px = camera_calibration['focal_length_px']
w = camera_calibration['image_width']
h = camera_calibration['image_height']
K = np.array([
[focal_length_px, 0, principal_point[1]],
[0, focal_length_px, principal_point[0]],
[0, 0, 1]
])
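Since `cv2.undistort()` may not be used, the distortion model has to be implemented directly. The sketch below assumes the coefficients follow the standard OpenCV (Brown-Conrady) model implied by the [k1, k2, p1, p2, k3] order, and uses inverse mapping. For every pixel of the desired undistorted image, it normalises the pixel with K, applies the forward distortion, and projects back to find where to read in the distorted input. `distorted_source_grid` is a hypothetical helper name.

```python
import numpy as np

def distorted_source_grid(K, dist_coeffs, height, width):
    """For each pixel of an ideal (undistorted) image, compute where it lies in the
    distorted image, using the radial/tangential model with [k1, k2, p1, p2, k3].

    Returns an [H, W, 2] grid of (u, v) = (column, row) source locations.
    """
    k1, k2, p1, p2, k3 = dist_coeffs
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    # pixel coordinates of the target (undistorted) image
    v, u = np.mgrid[0:height, 0:width].astype(np.float64)
    # back to normalised image coordinates
    x = (u - cx) / fx
    y = (v - cy) / fy
    r2 = x * x + y * y
    radial = 1 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    x_d = x * radial + 2 * p1 * x * y + p2 * (r2 + 2 * x * x)
    y_d = y * radial + p1 * (r2 + 2 * y * y) + 2 * p2 * x * y
    # project the distorted normalised coordinates back to pixels
    return np.stack([fx * x_d + cx, fy * y_d + cy], axis=-1)
```

`grid = distorted_source_grid(K, dist_coeffs, h, w)` produces the [H, W, 2] (u, v) grid that `sample_image` expects. Passing `grid.copy()` avoids surprises if the sampling function normalises its input in place. Mapping in this direction avoids inverting the distortion polynomial, which in general has no closed-form inverse.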
print(f"\nCamera Movement Matrix Shape: {camera_movement.shape}: ")
print(f"Camera Movement from Frame 0 to 1 (homogeneous) = [R|T]:\n{camera_movement[0]}")
Camera Movement Matrix Shape: (8, 4, 4): 
Camera Movement from Frame 0 to 1 (homogeneous) = [R|T]:
[[ 0.70710678 -0.1830127   0.6830127  -3.        ]
 [ 0.1830127   0.98037987  0.0732233  -0.25881905]
 [-0.6830127   0.0732233   0.72672691  0.96592583]
 [ 0.          0.          0.          1.        ]]
The shape (8, 4, 4) of the camera movement array indicates 8 camera movements, one between each pair of consecutive views, which matches the 9 given views (9 − 1 = 8). Each movement is a 4×4 homogeneous matrix combining a 3×3 rotation R and a translation vector T.
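Since each movement is a rigid transform, its inverse can be formed directly from R and T as [Rᵀ | −RᵀT] instead of calling a general matrix inverse. Such an inverse is needed, for example, to map points from view i+1 back into view i. The sketch below uses hypothetical helper names. It also checks that the rotation block is a proper rotation, which is a cheap sanity check on loaded data.

```python
import numpy as np

def invert_rigid_transform(T):
    """Invert a 4x4 homogeneous rigid transform [R|T] using R^T."""
    R = T[:3, :3]
    t = T[:3, 3]
    T_inv = np.eye(4)
    T_inv[:3, :3] = R.T       # the inverse of a rotation is its transpose
    T_inv[:3, 3] = -R.T @ t   # undo the translation in the rotated frame
    return T_inv

def is_rotation(R, atol=1e-6):
    """True if R is orthonormal with determinant +1 (no reflection)."""
    return np.allclose(R.T @ R, np.eye(3), atol=atol) and np.isclose(np.linalg.det(R), 1.0, atol=atol)
```

For instance, `is_rotation(camera_movement[0][:3, :3])` should hold for the matrix printed above, up to floating-point tolerance.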
print(f"Given 2D Features: {given_features.shape}")
print(f"Features in view 0(y,x): \n{given_features[0]}")
print(f"Coordinates of feature 0 in each view (y,x): \n{given_features[:, 0]}")
Given 2D Features: (9, 7, 2)
Features in view 0(y,x): 
[[163 616]
 [431 593]
 [380 672]
 [164 660]
 [378 462]
 [280 422]
 [274 650]]
Coordinates of feature 0 in each view (y,x): 
[[163 616]
 [192 722]
 [252 536]
 [ -1  -1]
 [168 693]
 [285 481]
 [187 650]
 [ -1  -1]
 [472 485]]
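The shape (9, 7, 2) of the given features array indicates one 7 × 2 array per view. Each of the 9 views has 7 feature slots, and each slot holds a 2D pixel location in (y, x), i.e. (row, column), order. An entry of [-1, -1] indicates that the feature is not visible in that view. Because the features are unsorted, slot 0 in one view does not necessarily correspond to slot 0 in another view. The second printout therefore lists slot 0 of each view, not one physical point tracked across views.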
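The features are stored in (y, x) order, while OpenCV point functions and matplotlib's `plot` expect (x, y). It can help to convert once and drop the invisible entries at the same time. The hypothetical helper below returns the visible features of one view in (x, y) order, together with their original slot indices. These indices are worth keeping because the slots are unsorted and correspondences still have to be established.

```python
import numpy as np

def visible_features_xy(features, view):
    """Return the visible features of one view as (x, y) = (column, row) pixel coordinates.

    features: [num_views, num_features, 2] array in (row, column) order,
              where [-1, -1] marks a feature that is not visible.
    Returns (points_xy, slot_indices): points_xy is [M, 2] float64 and
    slot_indices holds each point's original feature slot in that view.
    """
    pts = np.asarray(features[view])
    visible = ~np.all(pts == -1, axis=1)                   # drop [-1, -1] entries
    points_xy = pts[visible][:, ::-1].astype(np.float64)   # (row, col) -> (x, y)
    return points_xy, np.flatnonzero(visible)
```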
Visualizing all the views and their corresponding depth maps¶
plt.figure(figsize=(20,45))
for i in range(num_images):
    img_path_i = os.path.join(img_path, f'{i:02d}.png')
    depth_path_i = os.path.join(depth_path, f'{i:02d}.png')
    img_i = cv.imread(img_path_i)
    depth_i = cv.imread(depth_path_i, cv.IMREAD_UNCHANGED)
    plt.subplot(num_images, 2, 2 * i + 1)
    plt.imshow(depth_i, cmap='gray')
    plt.title(f'Depth {i:02d}')
    plt.axis('off')
    plt.subplot(num_images, 2, 2 * i + 2)
    plt.imshow(cv.cvtColor(img_i, cv.COLOR_BGR2RGB))
    plt.title(f'Image {i:02d}')
    plt.axis('off')
plt.tight_layout()
plt.show()
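Each depth map can be lifted to 3D by inverting the pinhole projection. A pixel (u, v) with Z-depth Z maps to X = (u − c_x)·Z/f_x and Y = (v − c_y)·Z/f_y. The sketch below makes two assumptions: the depth camera shares the pinhole intrinsics K of the RGB camera (the description only states that it is distortion-free), and a stored value of 0 marks no valid depth. Because each map is normalised by its own unknown maximum, `z_max` (a hypothetical parameter) only sets a per-image scale. Recovering consistent scales across views would be a separate step.

```python
import numpy as np

def backproject_depth(depth_u8, K, z_max=1.0):
    """Lift an 8-bit normalised depth map to one 3D point per valid pixel (camera frame).

    Assumes depth = Z / Z_max * 255 with the same intrinsics K as the RGB camera;
    z_max is a hypothetical scale (the true per-image maximum is not given).
    Returns [N, 3] points and the boolean mask of pixels used (depth > 0).
    """
    depth_u8 = np.asarray(depth_u8)
    if depth_u8.ndim == 3:  # guard in case the map was loaded with 3 identical channels
        depth_u8 = depth_u8[..., 0]
    z = depth_u8.astype(np.float64) / 255.0 * z_max
    h, w = z.shape
    v, u = np.mgrid[0:h, 0:w].astype(np.float64)
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    mask = z > 0
    x = (u - cx) / fx * z   # Z-depth (not range), so scale the normalised ray by Z
    y = (v - cy) / fy * z
    points = np.stack([x[mask], y[mask], z[mask]], axis=-1)
    return points, mask
```

For instance, `backproject_depth(depth_i, K)` would give the points of view i in its own camera frame, up to the unknown scale.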
Visualizing the features¶
plt.figure(figsize=(80, 50))
for i in range(num_images):
    plt.subplot(3, 3, i + 1)
    img_path_i = os.path.join(img_path, f'{i:02d}.png')
    img_i = cv.imread(img_path_i)
    img_i = cv.cvtColor(img_i, cv.COLOR_BGR2RGB)
    plt.imshow(img_i)
    features_i = given_features[i]
    for feature in features_i:
        #skip invisible features, stored as [-1, -1]
        if feature[0] != -1 and feature[1] != -1:
            #features are stored as (row, col), so plot x=col, y=row
            plt.plot(feature[1], feature[0], 'ro',markersize=30)
    plt.title(f'Image {i:02d} with Features')
    plt.axis('off')
plt.tight_layout()
plt.show()



